Lecture 9: (Semi-)bandits and experts with linear costs (part I)

Authors

  • Alex Slivkins
  • Amr Sharaf
Abstract

In this lecture, we will study bandit problems with linear costs. In this setting, actions are represented by vectors in a low-dimensional real space. For simplicity, we will assume that all actions lie within the unit hypercube: a ∈ [0, 1]^d. The action costs c_t(a) are linear in the vector a, namely c_t(a) = a · v_t for some weight vector v_t ∈ R^d which is the same for all actions but depends on the current time step. This problem is useful and challenging under full feedback as well as under bandit feedback; further, we will consider an intermediate regime called semi-bandit feedback. The plan for today is as follows:
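To make the setup concrete, here is a minimal sketch of the linear cost model and of what each feedback regime reveals to the learner after a round. The binary action set, the variable names, and the random weight vector are illustrative assumptions, not part of the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    d = 5  # dimension of the action space

    # Illustrative (assumed) action set: all binary vectors in {0,1}^d,
    # a common combinatorial special case of the unit hypercube [0,1]^d.
    actions = [np.array(bits, dtype=float) for bits in np.ndindex(*([2] * d))]

    # Weight vector for round t (hidden from the learner).
    v_t = rng.uniform(0.0, 1.0, size=d)

    a = actions[3]         # the action the learner plays this round
    cost = float(a @ v_t)  # linear cost: c_t(a) = a . v_t

    # What the learner observes after the round, per feedback regime:
    full_feedback = v_t                                   # the entire weight vector v_t
    semi_bandit = {i: v_t[i] for i in np.flatnonzero(a)}  # costs of the coordinates actually used
    bandit_feedback = cost                                # only the scalar cost of the action played

In the standard combinatorial semi-bandit setting, the learner observes the cost contribution of each coordinate it actually used, which is strictly more informative than bandit feedback but strictly less than full feedback.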

Related resources

Lecture 7: Full feedback and adversarial rewards (part I)

A real-life example is the investment problem. Each morning, we choose a stock to invest in. At the end of the day, we observe not only the price of our chosen stock but the prices of all stocks. Based on this kind of “full” feedback, we determine which stock to invest in the next day. A motivating special case of “bandits with full feedback” can be framed as a question-answering problem with experts...


G: Bandits, Experts and Games 10/10/16 Lecture 6: Lipschitz Bandits

Motivation: similarity between arms. In various bandit problems, we may have information on similarity between arms, in the sense that ‘similar’ arms have similar expected rewards. For example, arms can correspond to “items” (e.g., documents) with feature vectors, and similarity can be expressed as some notion of distance between feature vectors. Another example would be the dynamic pricing pro...


G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling

Here is a parameter to be adjusted in the analysis. Recall that K is the number of arms. We considered a “bandits with predictions” problem, and proved that it is impossible to make an accurate prediction with high probability if the time horizon is too small, regardless of what bandit algorithm we use to explore and make the prediction. In fact, we proved it for at least a third of problem ins...


CSC 2411 - Linear Programming and Combinatorial Optimization∗ Lecture 9: Semi-Definite Programming, Combinatorial Optimization

This lecture consists of two main parts. In the first one, we revisit Semi-Definite Programming (SDP). We show its equivalence to Vector Programming, prove that it has efficient membership and separation oracles, and finally state a theorem that shows why the Ellipsoid method can be used to obtain an approximate solution of a semi-definite program. In the second part, we make a first approach to Combinatori...


Lecture 9: Linear Bandits (Part II)

There exists an ellipsoidal confidence region for w, as described in the following theorem. Theorem 1 ([2], Theorem 2). Assuming ‖w‖ ≤ √d and ‖x_t‖ ≤ √d, with probability 1 − δ we have w ∈ C_t, where C_t = { z : ‖z − ŵ_t‖_{M_t} ≤ 2 √(d log(Td/δ)) }. For any x ∈ A, we define UCB_{x,t} = max_{z ∈ C_t} z′x; if w ∈ C_t (which holds with high probability), this is an upper bound on the true expected reward w′x. At each time, the UCB algorithm then simply picks the bandit with...
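As a sketch of how this UCB rule can be computed: for an ellipsoid C_t centered at the least-squares estimate ŵ_t, the maximum of z′x over z ∈ C_t has the closed form ŵ_t′x + β‖x‖_{M_t⁻¹}, so no explicit search over C_t is needed. Below is a minimal illustrative implementation in the standard LinUCB/OFUL template; the ridge regularization, the radius β, and all names are assumptions for illustration, not the exact construction of the cited theorem.

    import numpy as np

    def linucb_choose(actions, M, b, beta):
        """Pick the action with the largest upper confidence bound.

        actions : list of d-dimensional feature vectors x in A
        M       : d x d design matrix (assumed here: lambda*I + sum_s x_s x_s^T)
        b       : d-vector of reward-weighted features (sum_s r_s x_s)
        beta    : confidence radius, e.g. on the order of 2*sqrt(d*log(T*d/delta))
        """
        M_inv = np.linalg.inv(M)
        w_hat = M_inv @ b  # regularized least-squares estimate of w
        # Closed-form maximum of z'x over the confidence ellipsoid:
        ucbs = [w_hat @ x + beta * np.sqrt(x @ M_inv @ x) for x in actions]
        return int(np.argmax(ucbs))

    def linucb_update(M, b, x, reward):
        """Rank-one update of the design matrix and response vector after playing x."""
        return M + np.outer(x, x), b + reward * x

At each round the learner calls linucb_choose, plays the returned action, observes its reward, and folds the observation back in with linucb_update.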



Publication date: 2016